Skip to content

feat: implement merge multiple DVs#708

Open
HuaHuaY wants to merge 2 commits into
apache:mainfrom
HuaHuaY:impl_merge_dv
Open

feat: implement merge multiple DVs#708
HuaHuaY wants to merge 2 commits into
apache:mainfrom
HuaHuaY:impl_merge_dv

Conversation

@HuaHuaY

@HuaHuaY HuaHuaY commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

No description provided.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements support for merging multiple deletion vectors (DVs) that reference the same data file by rewriting them into merged Puffin DV blobs, enabling format v3 workflows that previously returned NotImplemented.

Changes:

  • Add DV-merge pipeline in MergingSnapshotUpdate that unions multiple DV bitmaps per referenced data file and writes merged DV blobs into a Puffin container.
  • Introduce TransactionContext::NewDataLocation(...) to generate table-consistent data-file output locations for merged DV Puffin files.
  • Promote Puffin + RoaringPositionBitmap components from iceberg_data into the core iceberg library (exports + build wiring) and add a test covering DV merging.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/iceberg/update/merging_snapshot_update.h Adds DV merge APIs and tracking for merged DV outputs.
src/iceberg/update/merging_snapshot_update.cc Implements DV merging via Puffin reader/writer and bitmap union; adds cleanup handling for merged DV outputs.
src/iceberg/transaction.h Adds TransactionContext::NewDataLocation API.
src/iceberg/transaction.cc Implements NewDataLocation via LocationProvider.
src/iceberg/test/merging_snapshot_update_test.cc Adds an integration-style test validating duplicate DV merging behavior and correctness.
src/iceberg/puffin/puffin_writer.h Moves export macro to core ICEBERG_EXPORT.
src/iceberg/puffin/puffin_reader.h Moves export macro to core ICEBERG_EXPORT.
src/iceberg/puffin/puffin_format.h Moves export macro to core ICEBERG_EXPORT.
src/iceberg/puffin/json_serde_internal.h Moves export macro to core ICEBERG_EXPORT.
src/iceberg/puffin/file_metadata.h Moves export macro to core ICEBERG_EXPORT.
src/iceberg/deletes/roaring_position_bitmap.h Moves export macro to core ICEBERG_EXPORT.
src/iceberg/meson.build Moves Puffin + bitmap sources/deps into iceberg lib; adjusts croaring linkage.
src/iceberg/CMakeLists.txt Moves Puffin + bitmap sources into iceberg lib; adjusts croaring linkage propagation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +774 to +780
ICEBERG_ASSIGN_OR_RAISE(
auto merged,
MergeDVsForReferencedFile(referenced_file, dvs, *writer, merged_dv_path));
merged_files.push_back(merged.file);
merged_dvs_.push_back(merged);
result.push_back(std::move(merged));
continue;
Comment on lines +821 to +825
ICEBERG_ASSIGN_OR_RAISE(auto input,
ctx_->table->io()->NewInputFile(dv.file->file_path));
ICEBERG_ASSIGN_OR_RAISE(auto reader,
puffin::PuffinReader::Make(std::move(input), std::nullopt,
dv.file->file_size_in_bytes));
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants